This report presents a strategic recommendation to assist Julia, the Category Manager, in preparing for an upcoming category review. The analysis focuses on customer purchasing trends and chip-buying behaviors, with particular attention to segmenting customers and identifying key metrics that define their behavior. Utilizing R, with Python as an alternative tool, this study includes data cleaning, outlier detection, and the creation of derived features like pack size and brand name. High-level data summaries and targeted metrics enable the identification of spending drivers across customer segments. The goal is to develop actionable insights with commercial applicability to inform Julia’s strategic decisions effectively.
Warning: package 'readxl' was built under R version 4.4.2
Warning: package 'dplyr' was built under R version 4.4.2
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
# Load the data
transaction_data <- read_excel("data/QVI_transaction_data.xlsx")
customer_data <- read.csv("data/QVI_purchase_behaviour.csv")
# Display the data
transaction_data# Summary statistics for transaction data
summary(transaction_data) # Summary statistics for transaction data DATE STORE_NBR LYLTY_CARD_NBR TXN_ID
Min. :43282 Min. : 1.0 Min. : 1000 Min. : 1
1st Qu.:43373 1st Qu.: 70.0 1st Qu.: 70021 1st Qu.: 67602
Median :43464 Median :130.0 Median : 130358 Median : 135138
Mean :43464 Mean :135.1 Mean : 135550 Mean : 135158
3rd Qu.:43555 3rd Qu.:203.0 3rd Qu.: 203094 3rd Qu.: 202701
Max. :43646 Max. :272.0 Max. :2373711 Max. :2415841
PROD_NBR PROD_NAME PROD_QTY TOT_SALES
Min. : 1.00 Length:264836 Min. : 1.000 Min. : 1.500
1st Qu.: 28.00 Class :character 1st Qu.: 2.000 1st Qu.: 5.400
Median : 56.00 Mode :character Median : 2.000 Median : 7.400
Mean : 56.58 Mean : 1.907 Mean : 7.304
3rd Qu.: 85.00 3rd Qu.: 2.000 3rd Qu.: 9.200
Max. :114.00 Max. :200.000 Max. :650.000
tibble [264,836 × 8] (S3: tbl_df/tbl/data.frame)
$ DATE : num [1:264836] 43390 43599 43605 43329 43330 ...
$ STORE_NBR : num [1:264836] 1 1 1 2 2 4 4 4 5 7 ...
$ LYLTY_CARD_NBR: num [1:264836] 1000 1307 1343 2373 2426 ...
$ TXN_ID : num [1:264836] 1 348 383 974 1038 ...
$ PROD_NBR : num [1:264836] 5 66 61 69 108 57 16 24 42 52 ...
$ PROD_NAME : chr [1:264836] "Natural Chip Compny SeaSalt175g" "CCs Nacho Cheese 175g" "Smiths Crinkle Cut Chips Chicken 170g" "Smiths Chip Thinly S/Cream&Onion 175g" ...
$ PROD_QTY : num [1:264836] 2 3 2 5 3 1 1 1 1 2 ...
$ TOT_SALES : num [1:264836] 6 6.3 2.9 15 13.8 5.1 5.7 3.6 3.9 7.2 ...
[1] 264836
# Summary statistics for customer data
summary(customer_data) # Summary statistics for customer data LYLTY_CARD_NBR LIFESTAGE PREMIUM_CUSTOMER
Min. : 1000 Length:72637 Length:72637
1st Qu.: 66202 Class :character Class :character
Median : 134040 Mode :character Mode :character
Mean : 136186
3rd Qu.: 203375
Max. :2373711
'data.frame': 72637 obs. of 3 variables:
$ LYLTY_CARD_NBR : int 1000 1002 1003 1004 1005 1007 1009 1010 1011 1012 ...
$ LIFESTAGE : chr "YOUNG SINGLES/COUPLES" "YOUNG SINGLES/COUPLES" "YOUNG FAMILIES" "OLDER SINGLES/COUPLES" ...
$ PREMIUM_CUSTOMER: chr "Premium" "Mainstream" "Budget" "Mainstream" ...
[1] 72637
The transaction data contains the following variables:
The customer data contains the following variables:
DATE STORE_NBR LYLTY_CARD_NBR TXN_ID PROD_NBR
0 0 0 0 0
PROD_NAME PROD_QTY TOT_SALES
0 0 0
LYLTY_CARD_NBR LIFESTAGE PREMIUM_CUSTOMER
0 0 0
From the results, there are no missing values in both the transaction and customer data.
# Fix date format in transaction data
transaction_data$DATE <- as.Date(transaction_data$DATE, origin = "1899-12-30")
transaction_data [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
[109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
[127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
[145] 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
[163] 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
[181] 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198
[199] 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216
[217] 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
[235] 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252
[253] 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270
[271] 271 272
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
[109] 109 110 111 112 113 114
[1] "Burger Rings 220g"
[2] "CCs Nacho Cheese 175g"
[3] "CCs Original 175g"
[4] "CCs Tasty Cheese 175g"
[5] "Cheetos Chs & Bacon Balls 190g"
[6] "Cheetos Puffs 165g"
[7] "Cheezels Cheese 330g"
[8] "Cheezels Cheese Box 125g"
[9] "Cobs Popd Sea Salt Chips 110g"
[10] "Cobs Popd Sour Crm &Chives Chips 110g"
[11] "Cobs Popd Swt/Chlli &Sr/Cream Chips 110g"
[12] "Dorito Corn Chp Supreme 380g"
[13] "Doritos Cheese Supreme 330g"
[14] "Doritos Corn Chip Mexican Jalapeno 150g"
[15] "Doritos Corn Chip Southern Chicken 150g"
[16] "Doritos Corn Chips Cheese Supreme 170g"
[17] "Doritos Corn Chips Nacho Cheese 170g"
[18] "Doritos Corn Chips Original 170g"
[19] "Doritos Mexicana 170g"
[20] "Doritos Salsa Medium 300g"
[21] "Doritos Salsa Mild 300g"
[22] "French Fries Potato Chips 175g"
[23] "Grain Waves Sweet Chilli 210g"
[24] "Grain Waves Sour Cream&Chives 210G"
[25] "GrnWves Plus Btroot & Chilli Jam 180g"
[26] "Infuzions BBQ Rib Prawn Crackers 110g"
[27] "Infuzions Mango Chutny Papadums 70g"
[28] "Infuzions SourCream&Herbs Veg Strws 110g"
[29] "Infuzions Thai SweetChili PotatoMix 110g"
[30] "Infzns Crn Crnchers Tangy Gcamole 110g"
[31] "Kettle 135g Swt Pot Sea Salt"
[32] "Kettle Chilli 175g"
[33] "Kettle Honey Soy Chicken 175g"
[34] "Kettle Mozzarella Basil & Pesto 175g"
[35] "Kettle Original 175g"
[36] "Kettle Sea Salt And Vinegar 175g"
[37] "Kettle Sensations BBQ&Maple 150g"
[38] "Kettle Sensations Camembert & Fig 150g"
[39] "Kettle Sensations Siracha Lime 150g"
[40] "Kettle Sweet Chilli And Sour Cream 175g"
[41] "Kettle Tortilla ChpsBtroot&Ricotta 150g"
[42] "Kettle Tortilla ChpsFeta&Garlic 150g"
[43] "Kettle Tortilla ChpsHny&Jlpno Chili 150g"
[44] "Natural Chip Compny SeaSalt175g"
[45] "Natural Chip Co Tmato Hrb&Spce 175g"
[46] "Natural ChipCo Hony Soy Chckn175g"
[47] "Natural ChipCo Sea Salt & Vinegr 175g"
[48] "NCC Sour Cream & Garden Chives 175g"
[49] "Old El Paso Salsa Dip Chnky Tom Ht300g"
[50] "Old El Paso Salsa Dip Tomato Med 300g"
[51] "Old El Paso Salsa Dip Tomato Mild 300g"
[52] "Pringles Barbeque 134g"
[53] "Pringles Chicken Salt Crips 134g"
[54] "Pringles Mystery Flavour 134g"
[55] "Pringles Original Crisps 134g"
[56] "Pringles Slt Vingar 134g"
[57] "Pringles SourCream Onion 134g"
[58] "Pringles Sthrn FriedChicken 134g"
[59] "Pringles Sweet&Spcy BBQ 134g"
[60] "Red Rock Deli Chikn&Garlic Aioli 150g"
[61] "Red Rock Deli Sp Salt & Truffle 150G"
[62] "Red Rock Deli SR Salsa & Mzzrlla 150g"
[63] "Red Rock Deli Thai Chilli&Lime 150g"
[64] "RRD Chilli& Coconut 150g"
[65] "RRD Honey Soy Chicken 165g"
[66] "RRD Lime & Pepper 165g"
[67] "RRD Pc Sea Salt 165g"
[68] "RRD Salt & Vinegar 165g"
[69] "RRD SR Slow Rst Pork Belly 150g"
[70] "RRD Steak & Chimuchurri 150g"
[71] "RRD Sweet Chilli & Sour Cream 165g"
[72] "Smith Crinkle Cut Bolognese 150g"
[73] "Smith Crinkle Cut Mac N Cheese 150g"
[74] "Smiths Chip Thinly Cut Original 175g"
[75] "Smiths Chip Thinly CutSalt/Vinegr175g"
[76] "Smiths Chip Thinly S/Cream&Onion 175g"
[77] "Smiths Crinkle Original 330g"
[78] "Smiths Crinkle Chips Salt & Vinegar 330g"
[79] "Smiths Crinkle Cut Chips Barbecue 170g"
[80] "Smiths Crinkle Cut Chips Chicken 170g"
[81] "Smiths Crinkle Cut Chips Chs&Onion170g"
[82] "Smiths Crinkle Cut Chips Original 170g"
[83] "Smiths Crinkle Cut French OnionDip 150g"
[84] "Smiths Crinkle Cut Salt & Vinegar 170g"
[85] "Smiths Crinkle Cut Snag&Sauce 150g"
[86] "Smiths Crinkle Cut Tomato Salsa 150g"
[87] "Smiths Crnkle Chip Orgnl Big Bag 380g"
[88] "Smiths Thinly Swt Chli&S/Cream175G"
[89] "Smiths Thinly Cut Roast Chicken 175g"
[90] "Snbts Whlgrn Crisps Cheddr&Mstrd 90g"
[91] "Sunbites Whlegrn Crisps Frch/Onin 90g"
[92] "Thins Chips Originl saltd 175g"
[93] "Thins Chips Light& Tangy 175g"
[94] "Thins Chips Salt & Vinegar 175g"
[95] "Thins Chips Seasonedchicken 175g"
[96] "Thins Potato Chips Hot & Spicy 175g"
[97] "Tostitos Lightly Salted 175g"
[98] "Tostitos Smoked Chipotle 175g"
[99] "Tostitos Splash Of Lime 175g"
[100] "Twisties Cheese 270g"
[101] "Twisties Cheese Burger 250g"
[102] "Twisties Chicken270g"
[103] "Tyrrells Crisps Ched & Chives 165g"
[104] "Tyrrells Crisps Lightly Salted 165g"
[105] "Woolworths Cheese Rings 190g"
[106] "Woolworths Medium Salsa 300g"
[107] "Woolworths Mild Salsa 300g"
[108] "WW Crinkle Cut Chicken 175g"
[109] "WW Crinkle Cut Original 175g"
[110] "WW D/Style Chip Sea Salt 200g"
[111] "WW Original Corn Chips 200g"
[112] "WW Original Stacked Chips 160g"
[113] "WW Sour Cream &OnionStacked Chips 160g"
[114] "WW Supreme Cheese Corn Chips 200g"
Burger Rings 220g
CCs Nacho Cheese 175g
CCs Original 175g
CCs Tasty Cheese 175g
Cheetos Chs & Bacon Balls 190g
Cheetos Puffs 165g
Cheezels Cheese 330g
Cheezels Cheese Box 125g
Cobs Popd Sea Salt Chips 110g
Cobs Popd Sour Crm &Chives Chips 110g
Cobs Popd Swt/Chlli &Sr/Cream Chips 110g
Dorito Corn Chp Supreme 380g
Doritos Cheese Supreme 330g
Doritos Corn Chip Mexican Jalapeno 150g
Doritos Corn Chip Southern Chicken 150g
Doritos Corn Chips Cheese Supreme 170g
Doritos Corn Chips Nacho Cheese 170g
Doritos Corn Chips Original 170g
Doritos Mexicana 170g
Doritos Salsa Medium 300g
Doritos Salsa Mild 300g
French Fries Potato Chips 175g
Grain Waves Sweet Chilli 210g
Grain Waves Sour Cream&Chives 210G
GrnWves Plus Btroot & Chilli Jam 180g
Infuzions BBQ Rib Prawn Crackers 110g
Infuzions Mango Chutny Papadums 70g
Infuzions SourCream&Herbs Veg Strws 110g
Infuzions Thai SweetChili PotatoMix 110g
Infzns Crn Crnchers Tangy Gcamole 110g
Kettle 135g Swt Pot Sea Salt
Kettle Chilli 175g
Kettle Honey Soy Chicken 175g
Kettle Mozzarella Basil & Pesto 175g
Kettle Original 175g
Kettle Sea Salt And Vinegar 175g
Kettle Sensations BBQ&Maple 150g
Kettle Sensations Camembert & Fig 150g
Kettle Sensations Siracha Lime 150g
Kettle Sweet Chilli And Sour Cream 175g
Kettle Tortilla ChpsBtroot&Ricotta 150g
Kettle Tortilla ChpsFeta&Garlic 150g
Kettle Tortilla ChpsHny&Jlpno Chili 150g
Natural Chip Compny SeaSalt175g
Natural Chip Co Tmato Hrb&Spce 175g
Natural ChipCo Hony Soy Chckn175g
Natural ChipCo Sea Salt & Vinegr 175g
NCC Sour Cream & Garden Chives 175g
Old El Paso Salsa Dip Chnky Tom Ht300g
Old El Paso Salsa Dip Tomato Med 300g
Old El Paso Salsa Dip Tomato Mild 300g
Pringles Barbeque 134g
Pringles Chicken Salt Crips 134g
Pringles Mystery Flavour 134g
Pringles Original Crisps 134g
Pringles Slt Vingar 134g
Pringles SourCream Onion 134g
Pringles Sthrn FriedChicken 134g
Pringles Sweet&Spcy BBQ 134g
Red Rock Deli Chikn&Garlic Aioli 150g
Red Rock Deli Sp Salt & Truffle 150G
Red Rock Deli SR Salsa & Mzzrlla 150g
Red Rock Deli Thai Chilli&Lime 150g
RRD Chilli& Coconut 150g
RRD Honey Soy Chicken 165g
RRD Lime & Pepper 165g
RRD Pc Sea Salt 165g
RRD Salt & Vinegar 165g
RRD SR Slow Rst Pork Belly 150g
RRD Steak & Chimuchurri 150g
RRD Sweet Chilli & Sour Cream 165g
Smith Crinkle Cut Bolognese 150g
Smith Crinkle Cut Mac N Cheese 150g
Smiths Chip Thinly Cut Original 175g
Smiths Chip Thinly CutSalt/Vinegr175g
Smiths Chip Thinly S/Cream&Onion 175g
Smiths Crinkle Original 330g
Smiths Crinkle Chips Salt & Vinegar 330g
Smiths Crinkle Cut Chips Barbecue 170g
Smiths Crinkle Cut Chips Chicken 170g
Smiths Crinkle Cut Chips Chs&Onion170g
Smiths Crinkle Cut Chips Original 170g
Smiths Crinkle Cut French OnionDip 150g
Smiths Crinkle Cut Salt & Vinegar 170g
Smiths Crinkle Cut Snag&Sauce 150g
Smiths Crinkle Cut Tomato Salsa 150g
Smiths Crnkle Chip Orgnl Big Bag 380g
Smiths Thinly Swt Chli&S/Cream175G
Smiths Thinly Cut Roast Chicken 175g
Snbts Whlgrn Crisps Cheddr&Mstrd 90g
Sunbites Whlegrn Crisps Frch/Onin 90g
Thins Chips Originl saltd 175g
Thins Chips Light& Tangy 175g
Thins Chips Salt & Vinegar 175g
Thins Chips Seasonedchicken 175g
Thins Potato Chips Hot & Spicy 175g
Tostitos Lightly Salted 175g
Tostitos Smoked Chipotle 175g
Tostitos Splash Of Lime 175g
Twisties Cheese 270g
Twisties Cheese Burger 250g
Twisties Chicken270g
Tyrrells Crisps Ched & Chives 165g
Tyrrells Crisps Lightly Salted 165g
Woolworths Cheese Rings 190g
Woolworths Medium Salsa 300g
Woolworths Mild Salsa 300g
WW Crinkle Cut Chicken 175g
WW Crinkle Cut Original 175g
WW D/Style Chip Sea Salt 200g
WW Original Corn Chips 200g
WW Original Stacked Chips 160g
WW Sour Cream &OnionStacked Chips 160g
WW Supreme Cheese Corn Chips 200g
[1] 1 2 3 4 5 200
[1] 1.50 1.70 1.80 1.90 2.10 2.30 2.40 2.60 2.70 2.80
[11] 2.90 3.00 3.10 3.25 3.30 3.40 3.60 3.70 3.80 3.90
[21] 4.20 4.30 4.40 4.50 4.60 4.80 5.10 5.20 5.40 5.60
[31] 5.70 5.80 5.90 6.00 6.20 6.30 6.50 6.60 6.80 6.90
[41] 7.20 7.40 7.50 7.60 7.80 8.10 8.40 8.50 8.60 8.70
[51] 8.80 9.00 9.20 9.30 9.50 9.60 9.75 9.90 10.20 10.40
[61] 10.50 10.80 11.10 11.20 11.40 11.50 11.60 11.70 11.80 12.00
[71] 12.40 12.60 12.90 13.00 13.20 13.50 13.80 14.00 14.40 14.50
[81] 14.80 15.00 15.20 15.30 15.50 15.60 16.20 16.25 16.50 16.80
[91] 17.10 17.20 17.60 17.70 18.00 18.40 18.50 19.00 19.50 20.40
[101] 21.00 21.50 21.60 22.00 22.80 23.00 23.60 25.50 27.00 28.50
[111] 29.50 650.00
[1] "MIDAGE SINGLES/COUPLES" "NEW FAMILIES" "OLDER FAMILIES"
[4] "OLDER SINGLES/COUPLES" "RETIREES" "YOUNG FAMILIES"
[7] "YOUNG SINGLES/COUPLES"
MIDAGE SINGLES/COUPLES
NEW FAMILIES
OLDER FAMILIES
OLDER SINGLES/COUPLES
RETIREES
YOUNG FAMILIES
YOUNG SINGLES/COUPLES
[1] "Budget" "Mainstream" "Premium"
Budget
Mainstream
Premium
# Remove outliers
q1 <- quantile(transaction_data$TOT_SALES, 0.25, na.rm = TRUE) # First quartile
q3 <- quantile(transaction_data$TOT_SALES, 0.75, na.rm = TRUE) # First quartile
IQR <- q3 - q1
lower_bound <- q1 - 1.5 * IQR
upper_bound <- q3 + 1.5 * IQR
transaction_data <- transaction_data[transaction_data$TOT_SALES <= upper_bound, ]
# Check for outliers in total sales
boxplot(transaction_data$TOT_SALES, main = "Total Sales Boxplot")Both histogram and boxplot show that the outliers have been removed from the total sales data and the total sales data now looks normally distributed. We now use numerical method to test our hypothesis.
# Shapiro-Wilk test for normality
sample_tot_sales <- transaction_data[sample(nrow(transaction_data), 1000), ]$TOT_SALES
shapiro.test(sample_tot_sales)
Shapiro-Wilk normality test
data: sample_tot_sales
W = 0.98153, p-value = 6.002e-10
The p-value is 7.682e-08 which is less than 0.05. Therefore, we reject the null hypothesis and conclude that the data is not normally distributed.
I will analyze the distribution of customers across different life stages.
# Sample data for young singles/couples
young_singles_couples <- merged_data[merged_data$LIFESTAGE == "YOUNG SINGLES/COUPLES", ]
young_singles_couples# Summary statistics for young singles/couples
summary(young_singles_couples) # Summary statistics for young singles/couples LYLTY_CARD_NBR DATE STORE_NBR TXN_ID
Min. : 1000 Min. :2018-07-01 Min. : 1.0 Min. : 1
1st Qu.: 65345 1st Qu.:2018-09-30 1st Qu.: 65.0 1st Qu.: 63089
Median : 133221 Median :2018-12-29 Median :133.0 Median :137478
Mean : 135616 Mean :2018-12-30 Mean :135.1 Mean :135184
3rd Qu.: 205375 3rd Qu.:2019-03-30 3rd Qu.:205.0 3rd Qu.:204443
Max. :2373711 Max. :2019-06-30 Max. :272.0 Max. :270205
PROD_NBR PROD_NAME PROD_QTY TOT_SALES
Min. : 1.00 Length:36321 Min. :1.000 Min. : 1.50
1st Qu.: 28.00 Class :character 1st Qu.:2.000 1st Qu.: 5.40
Median : 55.00 Mode :character Median :2.000 Median : 7.40
Mean : 56.19 Mean :1.828 Mean : 7.14
3rd Qu.: 84.00 3rd Qu.:2.000 3rd Qu.: 8.80
Max. :114.00 Max. :5.000 Max. :14.80
LIFESTAGE PREMIUM_CUSTOMER
Length:36321 Length:36321
Class :character Class :character
Mode :character Mode :character
'data.frame': 36321 obs. of 10 variables:
$ LYLTY_CARD_NBR : num 1000 1002 1007 1007 1010 ...
$ DATE : Date, format: "2018-10-17" "2018-09-16" ...
$ STORE_NBR : num 1 1 1 1 1 1 1 1 1 1 ...
$ TXN_ID : num 1 2 8 7 10 11 22 23 24 26 ...
$ PROD_NBR : num 5 58 10 49 51 59 3 97 38 19 ...
$ PROD_NAME : chr "Natural Chip Compny SeaSalt175g" "Red Rock Deli Chikn&Garlic Aioli 150g" "RRD SR Slow Rst Pork Belly 150g" "Infuzions SourCream&Herbs Veg Strws 110g" ...
$ PROD_QTY : num 2 1 1 1 2 1 1 1 1 1 ...
$ TOT_SALES : num 6 2.7 2.7 3.8 8.8 5.1 4.6 3 2.4 2.6 ...
$ LIFESTAGE : chr "YOUNG SINGLES/COUPLES" "YOUNG SINGLES/COUPLES" "YOUNG SINGLES/COUPLES" "YOUNG SINGLES/COUPLES" ...
$ PREMIUM_CUSTOMER: chr "Premium" "Mainstream" "Budget" "Budget" ...
[1] 259340
Looking at the summary statistics, we can see that their average
spent money is around $7.159, with an
average quantity of 1.832 products
purchased. The total sales under young singles/couples are
$260,405.3.
# Histogram of total sales for young singles/couples
hist(young_singles_couples$TOT_SALES, main = "Total Sales for Young Singles/Couples", xlab = "Total Sales ($)", ylab = "Frequency", col = "skyblue")The histogram shows that the total sales for young singles/couples might be normally distributed. We now use numerical method to test our hypothesis.
# Shapiro-Wilk test for normality
sample_tot_sales_young_SC <- young_singles_couples[sample(nrow(young_singles_couples), 1000), ]$TOT_SALES
shapiro.test(sample_tot_sales_young_SC)
Shapiro-Wilk normality test
data: sample_tot_sales_young_SC
W = 0.97553, p-value = 6.157e-12
The p-value is 1.22e-11 which is less than 0.05. Therefore, we reject the null hypothesis and conclude that the data is not normally distributed.
# Summarize total sales by product name
product_sales <- young_singles_couples %>%
group_by(PROD_NAME) %>%
summarise(Total_Sales = sum(TOT_SALES, na.rm = TRUE)) %>%
arrange(desc(Total_Sales)) # Optional: Sort by total sales in descending order
product_salesFrom this table, we can see the total sales for each product purchased by young singles and couples. The top 5 popular under this category are:
$5 655.0$5 192.0$5 119.2$4 930.5$4 839.3# Summarize total sales by premium status
premium_status_sales <- young_singles_couples %>%
group_by(PREMIUM_CUSTOMER) %>%
summarize(Total_sales = sum(TOT_SALES, na.rm = TRUE)) %>%
arrange(desc(Total_sales)) # Optional: Sort by total sales in descending order
premium_status_sales# Barplot for total sales by premium status
barplot(premium_status_sales$Total_sales, names.arg = premium_status_sales$PREMIUM_CUSTOMER, main = "Total Sales by Premium Status for Young Singles/Couples", xlab = "Premium Status", ylab = "Total Sales ($)", col = "skyblue")From this table, we can see the total sales for each premium status
purchased by young singles and couples. We can see that Mainstream have
the highest total sales of $156 882.0,
followed by Budget with $60 973.6 and then
Premium with $41 520.4. Now we investigate
top products in each premium status by total sales.
# Sample Mainstream Customers
mainstream_customers <- young_singles_couples[young_singles_couples$PREMIUM_CUSTOMER == "Mainstream", ]
# Summarize total sales by product name for mainstream customers
product_sales <- mainstream_customers %>%
group_by(PROD_NAME) %>%
summarise(Total_Sales = sum(TOT_SALES, na.rm = TRUE)) %>%
arrange(desc(Total_Sales)) # Optional: Sort by total sales in descending order
product_sales# Sample Mainstream Customers
budget_customers <- young_singles_couples[young_singles_couples$PREMIUM_CUSTOMER == "Budget", ]
# Summarize total sales by product name for mainstream customers
product_sales <- budget_customers %>%
group_by(PROD_NAME) %>%
summarise(Total_Sales = sum(TOT_SALES, na.rm = TRUE)) %>%
arrange(desc(Total_Sales)) # Optional: Sort by total sales in descending order
product_sales# Sample Mainstream Customers
premium_customers <- young_singles_couples[young_singles_couples$PREMIUM_CUSTOMER == "Premium", ]
# Summarize total sales by product name for mainstream customers
product_sales <- premium_customers %>%
group_by(PROD_NAME) %>%
summarise(Total_Sales = sum(TOT_SALES, na.rm = TRUE)) %>%
arrange(desc(Total_Sales)) # Optional: Sort by total sales in descending order
product_sales